Harnessing AI for Speech Reconstruction using Multi-view Silent Video Feed
Speechreading, or lipreading, is the technique of understanding speech and extracting
phonetic features from a speaker's visual cues, such as the movements of the lips,
face, teeth, and tongue. It has a wide range of multimedia applications, such as
in surveillance, Internet telephony, and as an aid to a person with hearing
impairments. However, most of the work in speechreading has been limited to
text generation from silent videos. Recently, research has started venturing
into generating (audio) speech from silent video sequences, but there have been
no developments thus far in dealing with divergent views and poses of a
speaker. Thus, although multiple camera feeds of a speaker may be available,
they have not been used to handle these different poses. To this end, this
paper presents the first multi-view speechreading and reconstruction system.
This work pushes the boundaries of multimedia research by putting forth a model
that leverages silent video feeds from multiple cameras recording the same
subject to generate intelligible speech for a speaker. Initial results confirm the usefulness of
exploiting multiple camera views in building an efficient speech reading and
reconstruction system. It further shows the optimal placement of cameras that
would lead to maximum intelligibility of speech. Finally, it lays out various
innovative applications of the proposed system, focusing on its potentially
prodigious impact not just in the security arena but in many other multimedia
analytics problems.
Comment: 2018 ACM Multimedia Conference (MM '18), October 22--26, 2018, Seoul, Republic of Korea
The role of aural frequency analysis in pitch perception with simultaneous complex tones
Pitch perception has long been an important issue in the psychoacoustic literature. In particular, the problem of complex-tone pitch, which does not simply depend on any single spectral frequency, has been the object of much interest during the past century. Since Seebeck (1841) discovered that upper partials contribute significantly to the pitch of complex tones, several mechanisms have been proposed, such as nonlinear distortion creating a difference tone (Helmholtz, 1863; Fletcher, 1924), interference between unresolved partials causing a periodic envelope pattern (Schouten, 1940; Plomp, 1967), or some form of central neural processing (Goldstein, 1973; Wightman, 1973; Terhardt, 1972). Most modern pitch theories agree that the pitch of a complex tone is directly or indirectly derived from spectral frequencies which are resolved in the cochlea.
Quantifying sound quality in loudspeaker reproduction
We present PREQUEL: Perceptual Reproduction Quality Evaluation for Loudspeakers. Instead of quantifying the loudspeaker system itself, PREQUEL quantifies the loudspeakers' overall perceived sound quality by assessing their acoustic output using a set of music signals. This approach introduces a major problem: subjects cannot be provided with an acoustic reference signal, and their judgment is based on an unknown, internal reference. However, an objective perceptual assessment algorithm needs a reference signal in order to be able to predict the perceived sound quality. In this paper, these reference signals are created by making binaural recordings with a head and torso simulator, using the best quality loudspeakers, in the ideal listening spot in the best quality listening environment. The reproduced reference signal with the highest subjective quality is compared to the acoustically degraded loudspeaker output. PREQUEL is developed and, subsequently, validated using three databases that contain binaurally recorded music fragments played over low to high quality loudspeakers in low to high quality listening rooms. The model shows a high average correlation (0.85) between objective and subjective measurements. PREQUEL thus allows prediction of the subjectively perceived sound quality of loudspeakers, taking into account the influence of the listening room and the listening position.
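The reported 0.85 figure is a Pearson correlation between the model's objective predictions and subjective listening scores across music fragments. As a generic illustration of how such a validation number is computed (the scores below are hypothetical, not data from the PREQUEL study), in pure Python:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-fragment scores: objective model output vs. subjective rating.
objective = [2.1, 3.4, 4.0, 2.8, 3.9]
subjective = [2.3, 3.1, 4.2, 2.6, 3.8]
r = pearson(objective, subjective)
```

A correlation near 1 indicates the objective model ranks and scales the fragments close to how listeners do; 0.85 averaged over three databases is a strong result for a no-external-reference setting.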
Enhancing the Quality of Service of mobile video technology by increasing multimodal synergy
Bandwidth is still a limiting factor for the Quality of Service (QoS) of mobile communication applications. In particular, for Voice over IP the QoS is not yet as good as for common, well-engineered, public-switched telephone networks. Multisensory communication has been identified as a possibility to moderate this limitation. One of the strengths of mobile video technology lies in its combination of visual and auditory modalities. However, one of the most salient features of mobile video applications is its small screen size. To test the potential of multimodal synergy for mobile devices, we assessed to what extent small screens affect multimodal synergy. This potential was assessed in an experiment with 54 participants, who conducted a standardised video-listening test for three talking-heads videos with a signal-to-noise ratio of –9 dB. The videos were presented on three different screen sizes, whilst keeping the video and auditory signals equal. Compared to a ground truth based on 359 participants, intelligibility was found to be significantly higher when using a large screen than when using a small screen. This indicates that mobile video technology has the potential for a significant multimodal synergy, to which screen size is a substantial constraint. To optimally benefit from their multimodal potential, we offer suggestions on how to increase the effective screen size for small-screen (e.g. mobile) devices and applications through elaborating the most relevant (visual) features. We conclude that knowledge about human sensory processing can alleviate the identified constraint and maximise the potential QoS of mobile video technology.
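A signal-to-noise ratio of –9 dB means the noise power is roughly eight times the speech power, a condition in which visual cues contribute substantially to intelligibility. As a minimal sketch of how such a stimulus condition can be produced (illustrative only; this is not the study's actual stimulus pipeline, and the tone/noise signals are placeholders), scale a noise signal so the mixture hits a target SNR:

```python
import math
import random

def mix_at_snr(speech, noise, snr_db):
    """Scale noise so the speech/noise power ratio equals snr_db, then mix."""
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    target_p_noise = p_speech / (10 ** (snr_db / 10.0))
    gain = math.sqrt(target_p_noise / p_noise)
    return [s + gain * n for s, n in zip(speech, noise)]

random.seed(0)
# Placeholder "speech": a 440 Hz tone at 8 kHz sampling; placeholder noise: white Gaussian.
speech = [math.sin(2 * math.pi * 440 * t / 8000) for t in range(8000)]
noise = [random.gauss(0.0, 1.0) for _ in range(8000)]
mixture = mix_at_snr(speech, noise, -9.0)  # noise power ~ 8x speech power
```

Keeping the auditory signal fixed at this SNR across all three screen sizes is what isolates screen size as the variable behind the intelligibility differences.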
Perceptual Evaluation of Speech Quality (PESQ) -- A New Method for Speech Quality Assessment of Telephone Networks and Codecs
Previous objective speech quality assessment models, such as bark spectral distortion (BSD), the perceptual speech quality measure (PSQM), and measuring normalizing blocks (MNB), have been found to be suitable for assessing only a limited range of distortions. A new model has therefore been developed for use across a wider range of network conditions, including analogue connections, codecs, packet loss, and variable delay.